Introduction

The  well-established  breast  cancer  risk  factors  may  account  for  only  47%  of  the  breast  cancer 
incidence in the United States. This leaves a considerable portion of breast cancer of
undetermined origin. This project investigates the possibility that environmental estrogens
are involved in the etiology of breast cancer. We hypothesize that specific features of
chemicals  can  be  identified  that  are  significantly  associated  with  female  and  breast  carcinogens 
and  that  these  features  are  related  to  mechanisms  of  chemical  carcinogenesis.  Our  overall 
scientific  objective  is  to  investigate  the  hypothesized  relationship  between  environmental 
chemicals, xenoestrogens, and the development of breast cancer.

Body 

As  reported  in  the  first  Annual  Report,  after  working  on  this  project  for  approximately  six 
months  at  the  University  of  Pittsburgh,  I  moved  to  the  Department  of  Environmental  Studies  at 
Louisiana  State  University  (LSU).  The  project  has  been  successfully  renegotiated  at  LSU.  The 
budget  has  been  redone  with  some  extra  time  added  as  a  no-cost  extension  to  help  make  up  for 
lost time. The project recommenced in the fall of 2002 at LSU. Therefore, although this is the
second Annual Report, it effectively covers the first full year of work on the project.

Software  Change 

The  structure-activity  relationship  (SAR)  modeling  was  originally  proposed  to  be  conducted 
with  the  MCASE  program.  However,  for  multiple  reasons,  I  have  decided  to  switch  platforms  to 
Tripos  Sybyl.  This  change  does  not  alter  the  project  and  I  am  currently  working  with  my  grants 
manager,  Dr.  Moore,  to  update  the  SOW. 

Early this year it became evident that MCASE was not producing models of sufficiently high
predictivity for this project (details discussed below). Because SIMCA modeling of aromatic
amine Salmonella mutagens and skin sensitizing agents had been successful in a project
supported by Procter & Gamble, we spent some time investigating whether Sybyl
could be employed to produce adequate models relating to this project.

Briefly,  although  Sybyl  and  MCASE  are  different  modeling  packages,  the  Sybyl  family  of  SAR 
modules  allows  for  a  similar  type  of  analysis  of  toxicants.  As  described  in  the  proposal,  MCASE 
takes  a  binary  approach  to  analyzing  toxicants  by  comparing  structural  features  (2-dimensional 
biophores)  found  in  active  and  inactive  compounds.  Similarly,  for  the  Sybyl  analyses  the 
HQSAR (hologram QSAR) program calculates 2-dimensional holograms (i.e., linear fragments
comparable to biophores) and the Advanced QSAR module uses the soft independent modeling
of class analogy (SIMCA) algorithm to perform statistical analysis of the holograms. SIMCA is
a regression-type analysis that develops predictive models based on categorical data (i.e.,
carcinogens  and  noncarcinogens).  Overall,  the  HQSAR-SIMCA  models  appear  to  be  superior  to 
MCASE  models. 
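
The hologram idea can be illustrated with a short Python sketch. This is a simplified illustration only, not the Tripos implementation: it enumerates linear atom paths in a small molecular graph and hashes each fragment label into a fixed-length array of counts. The function names, the 2-4 atom fragment sizes, the hologram length of 53, and the use of CRC32 as the hash are all assumptions made for the example. The SIMCA step that would follow is not shown.

```python
import zlib


def linear_fragments(atoms, bonds, min_len=2, max_len=4):
    """Enumerate linear atom paths as canonical label strings.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    A path and its reverse are treated as the same fragment.
    """
    neighbors = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    fragments = []

    def extend(path):
        # Record each undirected path once (only in one traversal direction).
        if len(path) >= min_len and path[0] < path[-1]:
            label = "-".join(atoms[k] for k in path)
            reverse = "-".join(atoms[k] for k in reversed(path))
            fragments.append(min(label, reverse))
        if len(path) < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def hologram(fragments, length=53):
    """Hash fragment labels into a fixed-length array of bin counts."""
    bins = [0] * length
    for frag in fragments:
        bins[zlib.crc32(frag.encode("ascii")) % length] += 1
    return bins


# Heavy-atom skeleton of ethanol: C-C-O
ethanol_fragments = linear_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_hologram = hologram(ethanol_fragments)
```

Each compound is thereby reduced to a vector of the same length, which is what allows a statistical method to compare active and inactive compounds on a common footing.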

Our initial HQSAR-SIMCA work included an analysis of potential carcinogens identified using ~1600
diverse Salmonella mutagens and an analysis of 122 compounds tested for estrogenicity using the
E-SCREEN assay; MCASE models already existed for both. The mutagenicity MCASE model and its
analysis have been published (1, 2) and the E-SCREEN environmental estrogen model has been accepted for
publication with minor revisions (3). In both instances, Sybyl has been able to develop models
with predictivity comparable to that of MCASE. We anticipate a manuscript from this work
describing  how  HQSAR-SIMCA  can  be  successfully  employed  for  computational  analysis  and 
prediction  of  environmental  toxicants. 

At this juncture, all projects in my laboratory are being transferred to Sybyl. Needless to say,
Tripos Sybyl software is, in my estimation, superior to MCASE. Notably, the Tripos family of
software  is  growing  and  is  on  the  cutting  edge  of  technology.  The  literature  is  replete  with 
Sybyl-based  investigations.  On  the  other  hand,  MCASE  is  owned  by  an  individual  and  has  the 
real  potential  to  become  a  legacy  or  obsolete  system  in  the  coming  years. 

Specific  Aim  Accomplishments 

The  Specific  Aims  for  year  one  are  as  follows: 

Specific Aim 1: Development and validation of SAR models for female breast carcinogens
(months  1-12). 

a.  Identify  chemicals  tested  in  female  rodents  from  the  Carcinogenic  Potency  Database  and 
the  National  Toxicology  Program  (month  1). 

b.  Enter  chemical  structures  and  potency  values  into  MCASE  program  (months  2-8). 

c. Validate models using 10-fold cross-validation (months 9-12).

d.  Summarize  and  interpret  models  and  prepare  publication. 

These models have been developed and validated (i.e., a-c) as planned. Moreover, in
conjunction with this, we have also taken the opportunity to update our existing rodent
carcinogenicity models so that all models (mouse and rat, as well as female-specific versions)
have been built on the same datasets and analyzed with the same software version.

However,  in  summarizing  and  interpreting  (i.e.,  d)  it  became  evident  that  the  models  were  not 
performing  as  well  as  anticipated  (Table  1).  Through  a  series  of  10-fold  cross-validations  we 
calculated  sensitivity  (%  carcinogens  correctly  predicted),  specificity  (%  noncarcinogens 
accurately  predicted)  and  the  overall  observed  correct  prediction  (OCP)  rate  for  each  model. 
Looking  at  the  shaded  rows  in  Table  1,  it  is  evident  that  the  models  developed  generally  had  a 
low  sensitivity  and  thus  were  not  able  to  accurately  predict  carcinogens. 
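
The three validation statistics defined above can be computed as in the following Python sketch. It is a minimal illustration under stated assumptions: the function names are hypothetical, the fold assignment is a simple modulo split rather than whatever scheme MCASE or Sybyl uses internally, and the observed and predicted values are invented toy data, not results from Table 1.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_idx, test_idx) pairs; item i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def validation_metrics(observed, predicted):
    """Return (sensitivity, specificity, OCP) as fractions.

    observed, predicted: equal-length sequences of booleans,
    where True means carcinogen.
    """
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o and p)
    true_neg = sum(1 for o, p in pairs if not o and not p)
    n_pos = sum(1 for o in observed if o)
    n_neg = len(observed) - n_pos
    sensitivity = true_pos / n_pos if n_pos else float("nan")
    specificity = true_neg / n_neg if n_neg else float("nan")
    ocp = (true_pos + true_neg) / len(observed)
    return sensitivity, specificity, ocp


# Toy data: 4 carcinogens followed by 4 noncarcinogens (invented values).
observed = [True, True, True, True, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]
sensitivity, specificity, ocp = validation_metrics(observed, predicted)
print(sensitivity, specificity, ocp)  # 0.5 0.75 0.625
```

In a cross-validation run, the predictions for the held-out compounds of every fold would be pooled before the three rates are calculated. In practice the compounds would also be shuffled before folds are assigned.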

A  common  problem  encountered  with  SAR  model  development  and  validation  is  that  the  model 
can  only  be  validated  against  existing  data.  Additionally,  data  used  in  the  learning  set  are  rarely 
a  complete  and  random  sample  of  the  universe  of  chemical  features.  Therefore,  when  using  a 
SAR model to predict the activity of a novel chemical, the model's predictive ability is
uncertain when the chemical falls outside the sample space of the learning set. This problem
manifests  itself  readily  in  MCASE  during  validation  studies,  particularly  when  the  learning  set  is 
relatively  small  and  multiple  mechanisms  are  involved  in  the  measured  endpoint.  Chemicals  that 
contain  a  unique  biophore  (i.e.,  a  feature  represented  only  in  the  chemicals  that  are  deleted  from 
the  learning  set  and  placed  in  the  validation  set)  are  not  accurately  predicted  since  the  model 
loses  the  chemical’s  unique  informational  contribution.  These  chemicals  fall  outside  the  sample 
space of the remaining model (i.e., they are outliers). This, in turn, lowers the sensitivity and
concordance of the model (specificity remains the same). In order to assess this limitation in the
validation  procedure  of  MCASE  and  its  derived  model,  all  of  the  active  chemicals  that  were 
identified  by  a  unique  biophore  were  removed  from  the  overall  validation  set  for  a  modified 
validation  study.  This  procedure  allows  for  the  validation  of  a  more  robust  model.  These 
modified  validation  sets  had  an  increase  in  sensitivity  for  rats  from  55%  to  65%  and  an  increase 
for  mice  from  56%  to  74%  (Table  1). 
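
The modified validation can be sketched in Python as follows. This is an illustration only: it assumes each active chemical is labeled with the single biophore that identifies it, treats a biophore as unique when it occurs in exactly one active chemical, and uses invented compound names and predictions. The field and function names are hypothetical and do not correspond to MCASE output.

```python
from collections import Counter


def drop_single_occurrence_actives(chemicals):
    """Remove actives whose identifying biophore occurs in only one active."""
    counts = Counter(c["biophore"] for c in chemicals if c["active"])
    return [c for c in chemicals
            if not (c["active"] and counts[c["biophore"]] == 1)]


def sensitivity(chemicals):
    """Fraction of active chemicals that were predicted active."""
    actives = [c for c in chemicals if c["active"]]
    return sum(1 for c in actives if c["predicted"]) / len(actives)


# Invented validation set: five actives and two inactives.
validation_set = [
    {"name": "A", "active": True, "biophore": "B1", "predicted": True},
    {"name": "B", "active": True, "biophore": "B1", "predicted": True},
    {"name": "C", "active": True, "biophore": "B2", "predicted": False},
    {"name": "D", "active": True, "biophore": "B3", "predicted": False},
    {"name": "E", "active": True, "biophore": "B1", "predicted": False},
    {"name": "F", "active": False, "biophore": None, "predicted": False},
    {"name": "G", "active": False, "biophore": None, "predicted": True},
]

modified_set = drop_single_occurrence_actives(validation_set)
before = sensitivity(validation_set)  # 2 of 5 actives predicted correctly
after = sensitivity(modified_set)     # 2 of 3 actives predicted correctly
```

Because only active chemicals are removed, the inactive chemicals and their predictions are untouched, which is why specificity does not change under this procedure.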


Not  surprisingly,  the  SAR  model  performance  is  enhanced  by  the  removal  of  these  single 
occurrence biophore chemicals. However, given that some of the female-specific models
had  sensitivities  well  below  50%,  I  began  to  question  the  ability  of  this  type  of  outlier  analysis  to 
produce  meaningful  models  of  female-specific  carcinogenesis.  Therefore,  models  were 
generated  using  Sybyl. 

Female  Carcinogen  Models 

Specific Aim 1a is for the creation of female-specific models. As discussed, these models have
been developed in MCASE. We are currently transferring them to Sybyl for HQSAR-SIMCA
analysis.

Mammary  Carcinogen  Models 

Although rodent and female-specific carcinogen models are important to this project,
the hallmark models are those developed for mammary carcinogens. Therefore, we thought it
prudent  to  verify  that  Sybyl  HQSAR-SIMCA  would  be  capable  of  producing  adequate  models  of 
breast  carcinogens.  We  have  established  two  HQSAR-SIMCA  models  for  mammary 
carcinogenesis,  one  each  for  rat  and  mouse  mammary  carcinogens  from  the  Carcinogenic 
Potency  Database  (CPDB)  as  part  of  Specific  Aim  1.  The  mouse  model,  based  on  48 
compounds  (50%  mammary  carcinogens  and  50%  noncarcinogens)  is  estimated  to  be 
approximately  81%  predictive  through  cross-validation.  The  rat  model,  based  on  200 
compounds,  is  approximately  77%  predictive. 

These  models  are  based  on  all  mammary  carcinogens  in  the  CPDB  including  several  male-only 
breast  carcinogens.  We  will  shortly  be  producing  a  model  of  female-only  breast  carcinogens. 
Also,  as  mentioned,  the  modeling  technique  compares  carcinogens  to  noncarcinogens.  In  these 
mammary  carcinogens  models  we  compared  mammary  carcinogens  to  compounds  that  were  not 
carcinogens  in  both  mice  and  rats.  This  is  perceived  as  the  widest  possible  separation  of  the 
breast  carcinogen-noncarcinogen  classes. 
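
The class definitions used for these learning sets can be expressed as a short Python sketch. It is a hedged illustration: the record layout, field names, and compounds are invented for the example and do not reflect the actual CPDB file format. Actives are compounds reported as mammary carcinogens in the chosen species, and inactives are compounds tested in both species and carcinogenic in neither.

```python
def build_learning_set(records, species):
    """Split records into mammary-carcinogen actives and clean inactives.

    Compounds that are carcinogenic at other sites, or that were not
    tested in both rats and mice, are left out of the learning set.
    """
    actives, inactives = [], []
    for rec in records:
        if species in rec["mammary_carcinogen_in"]:
            actives.append(rec["name"])
        elif {"rat", "mouse"} <= rec["tested_in"] and not rec["carcinogen_in"]:
            inactives.append(rec["name"])
    return actives, inactives


# Invented records for illustration only.
records = [
    {"name": "compound_1", "tested_in": {"rat", "mouse"},
     "carcinogen_in": {"rat"}, "mammary_carcinogen_in": {"rat"}},
    {"name": "compound_2", "tested_in": {"rat", "mouse"},
     "carcinogen_in": {"mouse"}, "mammary_carcinogen_in": set()},
    {"name": "compound_3", "tested_in": {"rat", "mouse"},
     "carcinogen_in": set(), "mammary_carcinogen_in": set()},
    {"name": "compound_4", "tested_in": {"rat"},
     "carcinogen_in": set(), "mammary_carcinogen_in": set()},
    {"name": "compound_5", "tested_in": {"rat", "mouse"},
     "carcinogen_in": {"rat", "mouse"},
     "mammary_carcinogen_in": {"rat", "mouse"}},
]

rat_actives, rat_inactives = build_learning_set(records, "rat")
mouse_actives, mouse_inactives = build_learning_set(records, "mouse")
```

Excluding compounds that are carcinogenic at non-mammary sites keeps the inactive class as far as possible from the active class, at the cost of a smaller learning set.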

Response  to  Technical  Issue  Raised  in  Year  1  Review 

In the first Annual Review we indicated the creation of searchable databases for NTP and CPDB
data. These databases compile the data from the CPDB and NTP into Excel
spreadsheets that are easily viewed in order to identify compounds of particular interest (e.g.,
female-specific  carcinogens).  Although  easily  viewable,  they  have  proven  to  be  tedious  and 
problematic  for  the  creation  of  learning  sets.  To  remedy  this,  we  have  developed  a  SAS  routine 
for  searching  the  data  in  its  original  format.  We  now  have  a  more  complete  tool  for  learning  set 
creation (SAS searching and Excel-based viewing of the data).

We also plan to implement shortly a routine to link the search results from SAS to a library of
chemical structures for CPDB and NTP compounds. This will allow us to rapidly search for
compounds of particular interest and instantly create files containing their chemical structures
for SAR analysis in Sybyl. We think this will be of sufficient utility that we plan to offer it to the
NTP and CPDB or, at a minimum, provide it on our website.

An added benefit of migrating from MCASE to Sybyl is that Sybyl allows the user
to produce learning sets (i.e., databases of chemicals, names, and toxicological data) that are
easily transportable between users and other systems. We therefore also plan to make each of
the specific model learning sets (e.g., the mammary and female carcinogen learning sets)
publicly available on our website.

Key  Research  Accomplishments 

Development  of  female  and  mammary  carcinogen  models  in  MCASE 
Development of mouse and rat mammary carcinogen models in Sybyl HQSAR-SIMCA
Ascertainment  that  Sybyl  is  an  adequate  replacement  for  MCASE 

Reportable  Outcomes 

Seminar  “Structure-activity  relationships:  Estrogen  mimics  and  endocrine  disruptors”  at  LSU 
Environmental  Lecture  Series  at  Tulane  University 

Conclusions 

With  the  success  of  the  first  mammary  carcinogen  models  we  anticipate  publishing  these  results 
in the near future (a proposed deliverable). If we succeed in modeling the general female
carcinogens, we will produce another manuscript describing female-specific carcinogens
(another deliverable).

To date, after effectively about one year of work, we have developed the models set
forth in Specific Aim 1 using MCASE. I estimate that we may be about one month behind
schedule  due  to  switching  to  Sybyl  HQSAR-SIMCA.  Moreover,  in  conjunction  with  this  and 
other  projects  in  my  laboratory,  all  the  required  components  for  Specific  Aim  2  are  being  moved 
from  MCASE  to  Sybyl  so  there  should  be  no  delay  or  problems  accomplishing  the  tasks  of 
Specific Aim 2. This is of particular relevance for Specific Aims 2a and 2b, which require other
relevant toxicological models (e.g., mutagenicity and estrogenicity) against which to compare the
female and mammary gland carcinogen models.

Looking  forward,  I  see  no  obstacles  to  the  successful  completion  of  this  project  in  a  timely 
manner.  By  switching  to  Sybyl,  we  envision  being  able  to  more  accurately  and  thoroughly 
investigate  the  chemical  structural  attributes  of  breast  carcinogens. 